The results don't look like those from the old vLLM (plots: GSPO with blue = v0, GRPO with orange = v0).
A main difference between vLLM v0 and v1 is that in v0, new requests are blocked until the weight-update request is fulfilled, whereas in v1, new requests go ahead during a weight update (and may be served with a mix of old and new weights). The logprobs are the same at the beginning but then start to diverge (logprob plots: blue and green are v0).
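For illustration, the v0-style behavior amounts to serializing weight updates against in-flight generation. Below is a minimal, generic asyncio sketch of that gating pattern (not vLLM internals; `run_request` and `run_update` are hypothetical callables standing in for the actual generation call and weight-update trigger):

```python
import asyncio


class WeightUpdateGate:
    """Emulate v0-style blocking: no request is served with mixed old/new weights."""

    def __init__(self) -> None:
        self._admission = asyncio.Lock()   # held for the duration of a weight update
        self._inflight = 0                 # generation requests currently running
        self._drained = asyncio.Event()
        self._drained.set()

    async def generate(self, run_request):
        # New requests wait here while an update is in progress (v0 behavior).
        async with self._admission:
            self._inflight += 1
            self._drained.clear()
        try:
            return await run_request()     # e.g. a completion request to the server
        finally:
            self._inflight -= 1
            if self._inflight == 0:
                self._drained.set()

    async def update_weights(self, run_update):
        async with self._admission:        # stop admitting new requests
            await self._drained.wait()     # let requests on the old weights finish
            await run_update()             # e.g. trigger the weight-update RPC
```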
Leave this PR open for now! Instead, upgrade vLLM to a recent version that still uses v0; see #122.

This PR upgrades vLLM from `0.8.5.post1` to `0.11.2`. Other notable upgrades that come with this change: torch to `2.9.0`, transformers to `4.57.x`, and flash-attention to `2.8.3`. The vLLM upgrade is needed for Apriel multi-modal training (#111), new tool parsers, and newer models.
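A quick, hypothetical post-install sanity check of the upgraded stack (not part of this PR; assumes the PyPI distribution names `vllm`, `torch`, `transformers`, and `flash-attn`):

```python
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {
    "vllm": "0.11.2",
    "torch": "2.9.0",
    "transformers": "4.57",   # any 4.57.x release
    "flash-attn": "2.8.3",
}

for pkg, want in EXPECTED.items():
    try:
        got = version(pkg)
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
        continue
    status = "OK" if got.startswith(want) else f"expected {want}"
    print(f"{pkg}=={got} ({status})")
```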
For weight updates in vLLM v1, I followed https://github.com/vllm-project/vllm/blob/v0.11.2/examples/offline_inference/rlhf_utils.py.
Also found similar code in TRL.
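A condensed sketch of what that example's weight-update path looks like, assuming the `worker_extension_cls` / `collective_rpc` mechanism used in vLLM's RLHF examples (simplified; names such as `model_update_group` follow the example code, the process-group setup is omitted, and details should be treated as approximate rather than a drop-in implementation):

```python
import torch
from vllm import LLM


class WeightSyncWorkerExtension:
    """Mixed into every vLLM worker via LLM(worker_extension_cls=...)."""

    # In the full example, an init RPC first creates a stateless NCCL process
    # group (`self.model_update_group`) shared with the trainer process; that
    # setup is omitted here.

    def update_weight(self, name: str, dtype: str, shape: tuple) -> None:
        # Receive one tensor broadcast from the trainer (rank 0 of the update
        # group) and load it into the live model without restarting the engine.
        weight = torch.empty(shape, dtype=getattr(torch, dtype), device="cuda")
        self.model_update_group.broadcast(
            weight, src=0, stream=torch.cuda.current_stream()
        )
        self.model_runner.model.load_weights(weights=[(name, weight)])
        del weight


def push_weights(llm: LLM, train_model: torch.nn.Module) -> None:
    """Trainer side: stream every parameter of the training model to vLLM."""
    for name, param in train_model.named_parameters():
        dtype = str(param.dtype).removeprefix("torch.")
        # The RPC tells every worker to post a matching broadcast; the trainer
        # then broadcasts `param.data` on its own end of the update group.
        llm.collective_rpc("update_weight", args=(name, dtype, tuple(param.shape)))
```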